Feat(ai-gemini): Gemini Realtime Adapter #405

nikas-belogolov wants to merge 7 commits into TanStack:main from
Conversation
📝 Walkthrough

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
There should be some discussion around how to implement the websocket connection, as there could be a client-server connection (straight to the realtime API), or client-server-server, with the web server (e.g. TanStack Start) acting as a proxy.
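The two topologies raised above can be sketched in TypeScript. This is purely a hypothetical illustration of the trade-off (none of these names exist in TanStack AI, and the provider URL is a placeholder): in the direct topology the server only mints a short-lived token and the browser dials the provider; in the proxy topology the browser dials our own server, which relays frames and keeps the long-lived API key private.

```typescript
// Hypothetical sketch; names and URLs are illustrative only.
type RealtimeTopology = 'direct' | 'proxy'

interface ConnectPlan {
  url: string // what the browser's WebSocket should dial
  auth: 'ephemeral-token' | 'session-cookie'
}

function planConnection(
  topology: RealtimeTopology,
  serverOrigin: string,
): ConnectPlan {
  if (topology === 'direct') {
    // client-server: the web server mints a short-lived token,
    // then the browser connects straight to the provider's realtime API.
    return { url: 'wss://provider.example/realtime', auth: 'ephemeral-token' }
  }
  // client-server-server: the browser connects to our own server,
  // which relays frames to the provider and never exposes the API key.
  return { url: `${serverOrigin}/api/realtime`, auth: 'session-cookie' }
}
```

The proxy route adds a hop of latency per audio frame but centralizes auth and logging; the direct route needs ephemeral-token support from the provider (which both OpenAI and Gemini Live offer).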
| Command | Status | Duration | Result |
|---|---|---|---|
| `nx affected --targets=test:sherif,test:knip,tes...` | ❌ Failed | 2m 47s | View ↗ |
| `nx run-many --targets=build --exclude=examples/**` | ❌ Failed | 48s | View ↗ |

☁️ Nx Cloud last updated this comment at 2026-03-29 10:28:30 UTC
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/typescript/ai-client/src/realtime-client.ts (1)
532-540: ⚠️ Potential issue | 🟡 Minor | Inconsistent tool mapping: `applySessionConfig` omits `outputSchema`.

The tool mapping in `connect()` (lines 104-114) includes both `inputSchema` and `outputSchema`, but `applySessionConfig()` only includes `inputSchema`. This inconsistency could lead to missing output schemas when the session is updated after initial connection.

🐛 Proposed fix for consistency

```diff
 const toolsConfig = tools
   ? Array.from(this.clientTools.values()).map((t) => ({
       name: t.name,
       description: t.description,
       inputSchema: t.inputSchema
         ? convertSchemaToJsonSchema(t.inputSchema)
         : undefined,
+      outputSchema: t.outputSchema
+        ? convertSchemaToJsonSchema(t.outputSchema)
+        : undefined,
     }))
   : undefined
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-client/src/realtime-client.ts` around lines 532-540: The applySessionConfig mapping for tools omits outputSchema, causing inconsistency with connect(); update applySessionConfig (the code building toolsConfig from this.clientTools.values()) to include outputSchema the same way inputSchema is handled by calling convertSchemaToJsonSchema on t.outputSchema when present, so toolsConfig contains both inputSchema and outputSchema (mirroring the mapping in connect()).
🧹 Nitpick comments (3)
packages/typescript/ai-client/src/realtime-client.ts (1)
99-102: Remove commented-out code.

This dead code should be removed to keep the codebase clean.
🧹 Proposed removal
```diff
-    // const toolsList =
-    //   this.clientTools.size > 0
-    //     ? Array.from(this.clientTools.values())
-    //     : undefined
-
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-client/src/realtime-client.ts` around lines 99-102: Remove the dead/commented-out block that defines toolsList in realtime-client.ts (the lines referencing toolsList and this.clientTools); delete the commented code entirely so no leftover commented declarations remain; ensure there are no other references to the removed snippet (search for toolsList and usages of this.clientTools in the surrounding code) and run lint/format to keep the file clean.

packages/typescript/ai-gemini/src/index.ts (1)
86-90: Consider exporting Gemini realtime types from the main entry point.

The `./realtime/index` module exports types (`GeminiRealtimeModel`, `GeminiRealtimeTokenOptions`, `GeminiRealtimeOptions`) that aren't re-exported here. Other adapters (text, summarize, image, tts) export their configuration types from the main entry point for consumer convenience.

♻️ Proposed addition for type exports

```diff
 // Realtime adapter
 export {
   geminiRealtime,
   geminiRealtimeToken,
 } from './realtime/index'
+export type {
+  GeminiRealtimeModel,
+  GeminiRealtimeTokenOptions,
+  GeminiRealtimeOptions,
+} from './realtime/index'
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/index.ts` around lines 86-90: Export the realtime types from the main entry point so consumers can import them alongside geminiRealtime and geminiRealtimeToken; add a type-only re-export like "export type { GeminiRealtimeModel, GeminiRealtimeTokenOptions, GeminiRealtimeOptions } from './realtime/index'" in the same file that currently exports geminiRealtime and geminiRealtimeToken. Ensure you use a type-only export to avoid runtime bundle changes and reference the exact type names GeminiRealtimeModel, GeminiRealtimeTokenOptions, and GeminiRealtimeOptions so IDEs and consumers can import them directly.

examples/ts-react-chat/src/routes/realtime.tsx (1)
279-293: Consider enabling additional configuration options for Gemini.

The tools indicator now correctly appears for Gemini. However, other session configuration options (output mode, temperature, semantic eagerness) remain OpenAI-only in the UI, but the underlying `RealtimeSessionConfig` supports these for all providers. Consider enabling some of these controls for Gemini if the Gemini Live API supports them.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/realtime.tsx` around lines 279-293: The UI currently gates several session configuration controls to provider === 'openai'; update those conditionals to also include 'gemini' so that controls for output mode, temperature, and semanticEagerness are shown when provider === 'gemini' (same as how you added the tools indicator for Gemini). Locate the conditional checks and UI blocks in the realtime component that reference provider (and the controls bound to RealtimeSessionConfig such as outputMode, temperature, semanticEagerness) and extend their logic to allow 'gemini'; ensure the form bindings still map to RealtimeSessionConfig fields so changes propagate to the session payload.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/ts-react-chat/src/lib/use-realtime.ts`:
- Around line 65-69: The Gemini branch is using shared OpenAI defaults (voice:
'alloy' and unconstrained responseModalities) which Gemini Live rejects; update
the geminiRealtime() path so it supplies Gemini-compatible defaults: set voice
to a valid Gemini prebuilt name (e.g., 'Puck' or another supported name) instead
of 'alloy', and set responseModalities to an explicit single-element array
(e.g., [Modality.AUDIO]) for the session; ensure geminiRealtime (or its caller
in useRealtime) overrides the shared defaults and enforces only one modality so
the Gemini connection will succeed.
In `@packages/typescript/ai-client/src/realtime-types.ts`:
- Around line 28-34: The ElevenLabs realtime adapter's connect function still
uses the old signature with clientToolDefs?: ReadonlyArray<AnyClientTool>, which
no longer matches the RealtimeAdapter.connect signature (token: RealtimeToken,
config: RealtimeSessionConfig) and causes type errors; update the connect method
in packages/typescript/ai-elevenlabs/src/realtime/adapter.ts (the function named
connect or the class implementing RealtimeAdapter) to accept (token:
RealtimeToken, config: RealtimeSessionConfig) and remove the legacy
clientToolDefs parameter, update any internal usages/calls within that adapter
to read configuration from the provided RealtimeSessionConfig, and ensure the
exported adapter type still satisfies RealtimeAdapter.
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Around line 96-112: liveConfig is missing transcription flags so Gemini won't
populate inputTranscription/outputTranscription; update the LiveConnectConfig
object built in the liveConfig variable to include inputAudioTranscription: {}
and outputAudioTranscription: {} (use the same config.providerOptions merge
pattern so you don't overwrite existing keys) so the API will emit transcription
data that the adapter reads when handling inputTranscription/outputTranscription
events.
- Around line 466-473: The sendImage function currently calls
session.sendRealtimeInput with a non-supported media property; update sendImage
to pass the image under the modality-specific video field instead of media or
deprecated mediaChunks (e.g., call session.sendRealtimeInput({ video: { /*
include the image bytes and mimeType as the frame payload */ } })). Locate
sendImage and replace the media object with a video object containing the image
data and mimeType in the API's expected frame/payload shape so the
session.sendRealtimeInput call uses video rather than media or mediaChunks.
- Around line 179-185: The convertFloat32ToInt16 function currently returns
buf.toString() (comma-separated integers); change it to produce a base64-encoded
string of the raw 16-bit PCM bytes suitable for the Gemini Live API audio.data
field (e.g., MIME audio/pcm;rate=16000). Convert the Float32Array to an
Int16Array (clamping and scaling as done now), then create a Uint8Array view
over the Int16Array's buffer (ensuring correct endianness), encode that byte
array to base64, and return the base64 string so audio.data contains raw PCM
bytes rather than CSV integers.
- Around line 188-190: The GoogleGenAI client is being constructed with an
ephemeral token (token.token) but missing the v1alpha API version; update the
initialization of GoogleGenAI (the ai instance) to include httpOptions: {
apiVersion: 'v1alpha' } when using ephemeral tokens so Live API requests use the
v1alpha endpoint, e.g., detect where new GoogleGenAI({ apiKey: token.token }) is
created and add the httpOptions.apiVersion flag.
In `@packages/typescript/ai-gemini/src/realtime/token.ts`:
- Around line 41-42: The default model string is invalid for the
GeminiRealtimeModel union: change the default assigned to the local variable
model (currently set via options.model ?? 'gemini-live-2.5-flash-native-audio')
to one of the allowed union values (e.g., 'gemini-3.1-flash-live-preview' or
'gemini-2.5-flash-native-audio-preview-12-2025') so the inferred type matches
GeminiRealtimeModel; update the expression using options.model to fall back to a
valid GeminiRealtimeModel literal.
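The realtime-types.ts prompt above describes the signature change without showing it. A minimal sketch of the updated contract, with placeholder type shapes (the real `RealtimeToken` and `RealtimeSessionConfig` live in `@tanstack/ai-client` and have more fields than shown here):

```typescript
// Placeholder shapes; the real definitions are in @tanstack/ai-client.
interface RealtimeToken {
  token: string
}

interface RealtimeSessionConfig {
  voice?: string
  instructions?: string
  tools?: Array<{ name: string; description?: string }>
}

// Updated contract: connect(token, config). The legacy clientToolDefs
// parameter is gone; tool definitions travel inside the session config.
interface RealtimeAdapter {
  connect(token: RealtimeToken, config: RealtimeSessionConfig): Promise<void>
}

// A stub adapter satisfying the new signature:
const stubAdapter: RealtimeAdapter = {
  async connect(_token, config) {
    // a real adapter would open the provider websocket here
    // and register config.tools with the session
    void config.tools
  },
}
```

The ElevenLabs adapter's `connect` would be brought in line with this shape, reading tool definitions from the config instead of a separate parameter.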
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 26ce7bc3-6e10-47ce-adc7-516c0d88c602
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (13)
- .changeset/huge-lizards-admire.md
- examples/ts-react-chat/src/lib/use-realtime.ts
- examples/ts-react-chat/src/routes/realtime.tsx
- packages/typescript/ai-client/src/realtime-client.ts
- packages/typescript/ai-client/src/realtime-types.ts
- packages/typescript/ai-gemini/package.json
- packages/typescript/ai-gemini/src/index.ts
- packages/typescript/ai-gemini/src/realtime/adapter.ts
- packages/typescript/ai-gemini/src/realtime/index.ts
- packages/typescript/ai-gemini/src/realtime/token.ts
- packages/typescript/ai-gemini/src/realtime/types.ts
- packages/typescript/ai-openai/src/realtime/adapter.ts
- packages/typescript/ai/src/realtime/types.ts
```ts
provider === 'openai'
  ? openaiRealtime()
  : provider === 'gemini'
    ? geminiRealtime()
    : elevenlabsRealtime()
```
🧩 Analysis chain
🌐 Web query:
Gemini Live API JavaScript supported responseModalities per session and supported prebuilt voice names
💡 Result:
In the Gemini Live API JavaScript SDK (@google/genai), supported responseModalities are specified as an array in the session config, e.g., responseModalities: [Modality.AUDIO] or [Modality.TEXT]. These are set per session via LiveConnectConfig and determine the output types (AUDIO and/or TEXT) for that session. Native audio models primarily support AUDIO, while others support combinations including TEXT. Supported prebuilt voice names (used in speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName) include 30 HD voices: Zephyr (Bright), Puck (Upbeat, default), Charon (Informative), Kore (Firm), Fenrir (Excitable), Leda (Youthful), Orus (Firm), Aoede (Breezy), Callirrhoe (Easy-going), Autonoe (Bright), Enceladus (Breathy), Iapetus (Clear), Laomedeia (Upbeat), Achernar (Soft), Alnilam (Firm), Schedar (Even), Gacrux (Mature), Pulcherrima (Forward), Sadachbia (Lively), Sadaltager (Knowledgeable), Sulafat (Warm), Umbriel (Easy-going), Erinome (Clear), Achird (Friendly), Algieba (Smooth), Algenib (Gravelly), Zubenelgenubi (Casual), Despina (Smooth), Rasalgethi (Informative), Vindemiatrix (Gentle). These are shared with TTS and configurable per session for audio responses.
Citations:
- 1: https://ai.google.dev/gemini-api/docs/live-guide
- 2: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-live
- 3: https://ai.google.dev/gemini-api/docs/live-api/capabilities
- 4: https://ai.google.dev/gemini-api/docs/speech-generation
- 5: https://ai.google.dev/api/live
- 6: https://googleapis.github.io/js-genai/release_docs/classes/live.Live.html
- 7: https://firebase.google.com/docs/ai-logic/live-api/configuration
- 8: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/live-api/configure-language-voice
🏁 Script executed:
```shell
cat -n examples/ts-react-chat/src/lib/use-realtime.ts
```

Repository: TanStack/ai
Length of output: 3793
Update Gemini adapter to use Gemini-compatible voice and responseModalities configuration.
The geminiRealtime() adapter selected at line 68 is routed through shared defaults below that are incompatible with Gemini Live. Line 92 specifies voice: 'alloy' (an OpenAI voice), but Gemini Live only supports 30 prebuilt voices with names like Puck (default), Kore, Aoede, etc. Additionally, Gemini Live requires responseModalities to be explicitly set per session as an array (e.g., [Modality.AUDIO]) and only supports a single response modality per session, whereas the current config allows unconstrained modalities. The Gemini branch will fail on connection with these defaults.
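A sketch of what provider-aware defaults could look like. The function and type names here are assumptions for illustration, not the actual use-realtime.ts code; the Gemini constraints (prebuilt voice names with `Puck` as the default, exactly one response modality per session) come from the web-query result above, and the ElevenLabs fallback is a pure placeholder:

```typescript
type Provider = 'openai' | 'gemini' | 'elevenlabs'

interface SessionDefaults {
  voice: string
  responseModalities: Array<'AUDIO' | 'TEXT'>
}

// Hypothetical helper: pick provider-compatible defaults instead of
// sharing OpenAI's across all branches.
function defaultsFor(provider: Provider): SessionDefaults {
  if (provider === 'gemini') {
    // Gemini Live: prebuilt voice names ('Puck' is the documented default)
    // and exactly one response modality per session.
    return { voice: 'Puck', responseModalities: ['AUDIO'] }
  }
  if (provider === 'openai') {
    return { voice: 'alloy', responseModalities: ['AUDIO', 'TEXT'] }
  }
  // elevenlabs: placeholder defaults for illustration only.
  return { voice: 'default', responseModalities: ['AUDIO'] }
}
```

The key point is that whichever layer merges defaults must do so per provider, so the shared OpenAI `voice: 'alloy'` never reaches the Gemini connection.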
```ts
const liveConfig: LiveConnectConfig = {
  responseModalities,
  tools: [{
    functionDeclarations: config.tools
  }],
  speechConfig: {
    voiceConfig: {
      prebuiltVoiceConfig: {
        voiceName: config.voice
      }
    }
  },
  maxOutputTokens: config.maxOutputTokens !== 'inf' ? config.maxOutputTokens : undefined,
  systemInstruction: config.instructions,
  temperature: config.temperature,
  ...config.providerOptions
};
```
🧩 Analysis chain
🌐 Web query:
Gemini Live API JavaScript how to enable input and output audio transcription in LiveConnectConfig
💡 Result:
To enable input and output audio transcription in the Gemini Live API using JavaScript with the @google/genai SDK, include inputAudioTranscription: {} and outputAudioTranscription: {} in the config object passed to ai.live.connect. Example code:

```ts
import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: 'YOUR_API_KEY' });
const session = await ai.live.connect({
  model: 'gemini-live-2.5-flash-preview', // or appropriate live model
  config: {
    responseModalities: [Modality.AUDIO],
    inputAudioTranscription: {},  // Enables transcription of user input audio
    outputAudioTranscription: {}, // Enables transcription of model output audio
  },
  callbacks: {
    onmessage: (message) => {
      const sc = message.serverContent;
      if (sc?.inputTranscription) {
        console.log('Input transcription:', sc.inputTranscription.text);
      }
      if (sc?.outputTranscription) {
        console.log('Output transcription:', sc.outputTranscription.text);
      }
    },
    // other callbacks...
  }
});
```

The AudioTranscriptionConfig is an empty object/marker interface; simply setting {} enables the feature. Transcriptions appear in serverContent.inputTranscription and serverContent.outputTranscription of LiveServerMessage events in the onmessage callback. Note: Ensure you're using a recent version of @google/genai (v0.9.0+) where transcription support was added. For Vertex AI, the config is nested under config as well.
Citations:
- 1: https://deepwiki.com/googleapis/js-genai/7.4-live-configuration
- 2: live inputAudioTranscription for inputTranscription server messages? googleapis/js-genai#454
- 3: Live inputTranscription server messages are not sent googleapis/js-genai#478
- 4: Gemini Live API - Transcription finished flag never updates in Javascript SDK googleapis/js-genai#1429
- 5: https://ai.google.dev/gemini-api/docs/live-guide
🏁 Script executed:

```shell
wc -l packages/typescript/ai-gemini/src/realtime/adapter.ts
```

Repository: TanStack/ai
Length of output: 112

🏁 Script executed:

```shell
cat -n packages/typescript/ai-gemini/src/realtime/adapter.ts
```

Repository: TanStack/ai
Length of output: 21398
Add inputAudioTranscription and outputAudioTranscription to enable transcript events.
The adapter emits transcript events from inputTranscription and outputTranscription (lines 230, 242), but liveConfig never enables transcription in the Gemini Live API config. Without inputAudioTranscription: {} and outputAudioTranscription: {}, the Gemini API will not populate these fields in the response, causing the transcript events to never fire.
Proposed fix
```diff
 const liveConfig: LiveConnectConfig = {
   responseModalities,
+  inputAudioTranscription: {},
+  outputAudioTranscription: {},
   tools: [{
     functionDeclarations: config.tools
   }],
```

📝 Committable suggestion
}],📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
const liveConfig: LiveConnectConfig = {
  responseModalities,
  inputAudioTranscription: {},
  outputAudioTranscription: {},
  tools: [{
    functionDeclarations: config.tools
  }],
  speechConfig: {
    voiceConfig: {
      prebuiltVoiceConfig: {
        voiceName: config.voice
      }
    }
  },
  maxOutputTokens: config.maxOutputTokens !== 'inf' ? config.maxOutputTokens : undefined,
  systemInstruction: config.instructions,
  temperature: config.temperature,
  ...config.providerOptions
};
```
```ts
function convertFloat32ToInt16(buffer: Float32Array) {
  let l = buffer.length;
  const buf = new Int16Array(l);
  while (l--) {
    buf[l] = Math.min(1, Math.max(-1, buffer[l]!)) * 0x7fff;
  }
  return buf.toString();
}
```
🧩 Analysis chain
🌐 Web query:
@google/genai session.sendRealtimeInput audio.data expected format for Gemini Live JavaScript
💡 Result:
The expected format for audio.data in session.sendRealtimeInput({ audio: { data: ..., mimeType: ... } }) for the @google/genai JavaScript SDK (Gemini Live) is a base64-encoded string of raw 16-bit PCM audio data (little-endian, typically 16kHz sample rate). Usage example:

```ts
session.sendRealtimeInput({
  audio: {
    data: audioChunk.toString('base64'), // audioChunk is a Buffer/ArrayBuffer/Uint8Array of raw PCM bytes
    mimeType: 'audio/pcm;rate=16000'
  }
});
```

Key details:
- Audio format: Raw PCM, 16-bit signed integers, little-endian, mono.
- Sample rate: 16kHz native (API resamples if needed).
- mimeType: Must include rate, e.g., 'audio/pcm;rate=16000'.
- Send in small chunks for real-time streaming.

This matches official Google AI for Developers documentation and SDK source patterns.
Citations:
- 1: https://ai.google.dev/gemini-api/docs/live-guide
- 2: https://ai.google.dev/gemini-api/docs/live-api/capabilities
- 3: https://github.com/googleapis/js-genai/blob/89400098/src/live.ts
- 4: https://tessl.io/registry/tessl/npm-google--genai/1.30.0/docs/live.md
🏁 Script executed:
# First, let's check if the file exists and read the relevant section
cat -n packages/typescript/ai-gemini/src/realtime/adapter.ts | sed -n '170,200p'Repository: TanStack/ai
Length of output: 981
🏁 Script executed:
# Search for usages of convertFloat32ToInt16 in the codebase
rg "convertFloat32ToInt16" --type ts --type tsx -B 2 -A 5Repository: TanStack/ai
Length of output: 82
🏁 Script executed:
# Check for any other audio-related functions in the adapter file
rg "audio|pcm|Audio|PCM" packages/typescript/ai-gemini/src/realtime/adapter.ts -nRepository: TanStack/ai
Length of output: 3431
Encode microphone PCM as base64-encoded raw bytes, not comma-separated integers.
Int16Array.prototype.toString() returns a comma-separated decimal string like "12,-44,9", but the Gemini Live API expects the audio.data field to contain a base64-encoded string of raw 16-bit PCM bytes with a MIME type like audio/pcm;rate=16000. Passing the current comma-separated format will cause the audio payload to be rejected or misinterpreted server-side, preventing microphone input from decoding correctly.
🐛 Proposed fix
```diff
 function convertFloat32ToInt16(buffer: Float32Array) {
-  let l = buffer.length;
-  const buf = new Int16Array(l);
-  while (l--) {
-    buf[l] = Math.min(1, Math.max(-1, buffer[l]!)) * 0x7fff;
-  }
-  return buf.toString();
+  const pcm = new Int16Array(buffer.length)
+  for (let i = 0; i < buffer.length; i++) {
+    pcm[i] = Math.min(1, Math.max(-1, buffer[i]!)) * 0x7fff
+  }
+
+  const bytes = new Uint8Array(pcm.buffer)
+  let binary = ''
+  for (const byte of bytes) {
+    binary += String.fromCharCode(byte)
+  }
+  return btoa(binary)
 }
```
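The encoding this finding calls for can be exercised in isolation. A standalone sketch, with an illustrative function name (not the adapter's), assuming a `btoa` global as in browsers and Node 16+:

```typescript
// Clamp float samples to [-1, 1], scale to 16-bit signed integers, then
// base64-encode the little-endian PCM bytes, which is what Gemini Live's
// audio.data field (mimeType 'audio/pcm;rate=16000') expects.
function float32ToPcm16Base64(samples: Float32Array): string {
  const pcm = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    pcm[i] = Math.round(Math.min(1, Math.max(-1, samples[i]!)) * 0x7fff)
  }
  const bytes = new Uint8Array(pcm.buffer)
  let binary = ''
  for (const byte of bytes) {
    binary += String.fromCharCode(byte)
  }
  return btoa(binary) // raw PCM bytes, not comma-separated integers
}
```

Decoding the result with `atob` and viewing the bytes as an `Int16Array` recovers the original samples, which makes the fix easy to unit-test without a live session.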
```ts
const ai = new GoogleGenAI({
  apiKey: token.token
});
```
🧩 Analysis chain
🌐 Web query:
Does @google/genai require httpOptions.apiVersion = 'v1alpha' when connecting to Gemini Live with an ephemeral token?
💡 Result:
Yes, @google/genai requires httpOptions.apiVersion = 'v1alpha' when connecting to Gemini Live with an ephemeral token. Ephemeral tokens are exclusively compatible with the Live API and mandate the v1alpha API version. Official documentation explicitly states that when using an ephemeral token as the apiKey for client initialization, the http_options must be set to {'api_version': 'v1alpha'}. This is also required for Live API preview features like affective dialog and proactive audio, and for creating the ephemeral tokens themselves.
Citations:
- 1: https://ai.google.dev/gemini-api/docs/live-api/capabilities
- 2: https://ai.google.dev/gemini-api/docs/live-api/ephemeral-tokens
- 3: https://ai.google.dev/gemini-api/docs/ephemeral-tokens
- 4: https://googleapis-python-genai-70.mintlify.app/guides/http-options
- 5: https://ai.google.dev/gemini-api/docs/api-versions
🏁 Script executed:

```shell
# First, let's check the file structure and the specific lines mentioned
head -n 200 packages/typescript/ai-gemini/src/realtime/adapter.ts | tail -n 20
```

Repository: TanStack/ai
Length of output: 520

🏁 Script executed:

```shell
# Let's also search for other GoogleGenAI instantiations in the codebase
rg "new GoogleGenAI" -A 3 packages/typescript/ai-gemini/
```

Repository: TanStack/ai
Length of output: 873

🏁 Script executed:

```shell
# Check the `@google/genai` version in the project to verify SDK compatibility
cat packages/typescript/ai-gemini/package.json | grep -A 5 -B 5 "@google/genai"
```

Repository: TanStack/ai
Length of output: 264

🏁 Script executed:

```shell
# Check more context around the GoogleGenAI initialization to verify it's for Live API
sed -n '180,210p' packages/typescript/ai-gemini/src/realtime/adapter.ts
```

Repository: TanStack/ai
Length of output: 907

🏁 Script executed:

```shell
# Also check the token.ts file to understand the different use case
cat -n packages/typescript/ai-gemini/src/realtime/token.ts | head -n 50
```

Repository: TanStack/ai
Length of output: 1966
Add v1alpha API version when initializing GoogleGenAI with ephemeral token.
Ephemeral tokens for Gemini Live API are v1alpha-only. When using an ephemeral token as the API key, the client must be initialized with httpOptions: { apiVersion: 'v1alpha' }. Without this configuration, Live API requests will route to the wrong API version and fail.
Proposed fix
```diff
 const ai = new GoogleGenAI({
-  apiKey: token.token
+  apiKey: token.token,
+  httpOptions: {
+    apiVersion: 'v1alpha',
+  },
 });
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
const ai = new GoogleGenAI({
  apiKey: token.token,
  httpOptions: {
    apiVersion: 'v1alpha',
  },
});
```
```ts
sendImage(imageData: string, mimeType: string) {
  // Only accepts raw image data, not URLs
  session.sendRealtimeInput({
    media: {
      data: imageData,
      mimeType: mimeType,
    }
  })
```
🧩 Analysis chain
🌐 Web query:
Gemini Live API JavaScript realtime image input mediaChunks vs media field
💡 Result:
In the Gemini Live API for JavaScript (using the @google/genai SDK), realtime image input is sent via the session.sendRealtimeInput method. Images (for vision/video at <=1 FPS, JPEG recommended) are provided as a Blob object with base64-encoded data and a mimeType like 'image/jpeg'. Example from official docs:

```ts
session.sendRealtimeInput({
  video: { // Use 'video' field for image frames
    data: frame.toString('base64'),
    mimeType: 'image/jpeg'
  }
});
```

The underlying protocol (BidiGenerateContentRealtimeInput) previously used a 'mediaChunks[]' array of Blobs for generic media input, but this is DEPRECATED. Now use modality-specific fields: 'audio' for PCM audio chunks, 'video' for image frames, or 'text' for text. Multiple mediaChunks are ignored beyond the first, and the new fields provide clearer typing and streaming support. For non-realtime (context-building), use sendClientContent with inlineData parts, e.g.:

```ts
session.sendClientContent({
  turns: [{ parts: [{ inlineData: { mimeType: 'image/jpeg', data: base64Image } }] }]
});
```

No distinct 'media' field exists in current realtime input; 'video' handles images. Use sendRealtimeInput for continuous streaming (optimized for low latency, non-deterministic order), sendClientContent for ordered context addition.
Citations:
- 1: https://ai.google.dev/gemini-api/docs/live-api/capabilities
- 2: https://ai.google.dev/gemini-api/docs/live-guide
- 3: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/live-api/send-audio-video-streams
- 4: https://ai.google.dev/api/live
- 5: https://github.com/googleapis/js-genai/blob/89400098/src/live.ts
- 6: https://github.com/googleapis/js-genai/blob/main/sdk-samples/live_client_content.ts
- 7: https://tessl.io/registry/tessl/npm-google--genai/1.30.0/files/docs/live.md
sendImage must use the video field, not media or deprecated mediaChunks.
The sendRealtimeInput() method does not accept a media property. For image frames, use the video field instead. The mediaChunks field (suggested in the diff below) is deprecated and should not be used; the current API uses modality-specific fields: video for image frames, audio for PCM audio, and text for text.
🐛 Proposed fix

```diff
 sendImage(imageData: string, mimeType: string) {
-  // Only accepts raw image data, not URLs
   session.sendRealtimeInput({
-    media: {
-      data: imageData,
-      mimeType: mimeType,
-    }
+    video: {
+      data: imageData,
+      mimeType,
+    },
   })
   currentMode = 'thinking'
   emit('mode_change', { mode: 'thinking' })
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 sendImage(imageData: string, mimeType: string) {
-  // Only accepts raw image data, not URLs
   session.sendRealtimeInput({
-    media: {
-      data: imageData,
-      mimeType: mimeType,
-    }
+    video: {
+      data: imageData,
+      mimeType,
+    },
   })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 466 -
473, The sendImage function currently calls session.sendRealtimeInput with a
non-supported media property; update sendImage to pass the image under the
modality-specific video field instead of media or deprecated mediaChunks (e.g.,
call session.sendRealtimeInput({ video: { /* include the image bytes and
mimeType as the frame payload */ } })). Locate sendImage and replace the media
object with a video object containing the image data and mimeType in the API's
expected frame/payload shape so the session.sendRealtimeInput call uses video
rather than media or mediaChunks.
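A hedged sketch of the modality-specific input shape described in the web-query result above; the union below is hand-written for illustration, not imported from `@google/genai`:

```typescript
// Illustrative types only: the real SDK ships its own parameter types
// for sendRealtimeInput; this mirrors the documented modality fields.
type RealtimeInput =
  | { video: { data: string; mimeType: string } } // image frames (<=1 FPS, JPEG recommended)
  | { audio: { data: string; mimeType: string } } // PCM audio chunks
  | { text: string }                              // text input

// Builds the payload sendImage should pass to session.sendRealtimeInput().
function imageFrameInput(
  base64Data: string,
  mimeType: string = 'image/jpeg',
): RealtimeInput {
  return { video: { data: base64Data, mimeType } }
}
```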
```ts
const model: GeminiRealtimeModel =
  options.model ?? 'gemini-live-2.5-flash-native-audio'
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
echo "Default model literal used by token adapter:"
rg -n "gemini-live-2\.5-flash-native-audio|gemini-2\.5-flash-native-audio-preview-12-2025" -- packages/typescript/ai-gemini/src/realtime
echo
echo "Declared GeminiRealtimeModel union:"
sed -n '40,60p' packages/typescript/ai-gemini/src/realtime/types.ts
```

Repository: TanStack/ai
Length of output: 1293
Fix the default Gemini model ID to match the declared type.
options.model ?? 'gemini-live-2.5-flash-native-audio' assigns a string literal that is not in the GeminiRealtimeModel union type. The declared union only includes 'gemini-3.1-flash-live-preview' and 'gemini-2.5-flash-native-audio-preview-12-2025'. Replace with the valid model ID:
Fix

```diff
 const model: GeminiRealtimeModel =
-  options.model ?? 'gemini-live-2.5-flash-native-audio'
+  options.model ?? 'gemini-2.5-flash-native-audio-preview-12-2025'
```

📝 Committable suggestion
```diff
 const model: GeminiRealtimeModel =
-  options.model ?? 'gemini-live-2.5-flash-native-audio'
+  options.model ?? 'gemini-2.5-flash-native-audio-preview-12-2025'
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/realtime/token.ts` around lines 41 - 42,
The default model string is invalid for the GeminiRealtimeModel union: change
the default assigned to the local variable model (currently set via
options.model ?? 'gemini-live-2.5-flash-native-audio') to one of the allowed
union values (e.g., 'gemini-3.1-flash-live-preview' or
'gemini-2.5-flash-native-audio-preview-12-2025') so the inferred type matches
GeminiRealtimeModel; update the expression using options.model to fall back to a
valid GeminiRealtimeModel literal.
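The fix can be sketched as follows; the union is copied from the finding above (`realtime/types.ts` remains the source of truth), and typing the fallback as `GeminiRealtimeModel` makes the compiler reject any stale default ID:

```typescript
// Union as reported from packages/typescript/ai-gemini/src/realtime/types.ts
type GeminiRealtimeModel =
  | 'gemini-3.1-flash-live-preview'
  | 'gemini-2.5-flash-native-audio-preview-12-2025'

// Annotating the constant with the union means an invalid literal like
// 'gemini-live-2.5-flash-native-audio' fails to compile.
const DEFAULT_REALTIME_MODEL: GeminiRealtimeModel =
  'gemini-2.5-flash-native-audio-preview-12-2025'

function resolveModel(requested?: GeminiRealtimeModel): GeminiRealtimeModel {
  return requested ?? DEFAULT_REALTIME_MODEL
}
```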
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/typescript/ai-elevenlabs/src/realtime/adapter.ts (1)
35-47: ⚠️ Potential issue | 🟠 Major

Wire `options` and `config` parameters to session initialization.

The `connect()` method accepts both `options` (from `elevenlabsRealtime()`) and `_config` (`RealtimeSessionConfig`) but passes neither to `createElevenLabsConnection()`, causing caller-provided configuration to be silently dropped. While `ElevenLabsRealtimeOptions` includes meaningful fields (`connectionMode`, `debug`), and the Gemini adapter demonstrates proper config wiring, the ElevenLabs implementation ignores both parameters. At minimum, rename `_config` to `config` to reflect its intended use and thread it through to session creation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-elevenlabs/src/realtime/adapter.ts` around lines 35 - 47, The connect implementation of elevenlabsRealtime is dropping the caller-provided ElevenLabsRealtimeOptions and RealtimeSessionConfig; rename the unused parameter _config to config in the connect signature and thread both options (from elevenlabsRealtime) and config (RealtimeSessionConfig) into createElevenLabsConnection so session initialization receives connectionMode/debug and session config; update the call sites inside elevenlabsRealtime.connect to pass (token, config, options, clientToolDefs) or the equivalent parameter order expected by createElevenLabsConnection and adjust createElevenLabsConnection invocation accordingly.
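A minimal sketch of the wiring this prompt asks for; the option and config shapes below are simplified stand-ins for the real `ElevenLabsRealtimeOptions` and `RealtimeSessionConfig` types:

```typescript
// Simplified stand-in types for illustration.
interface ElevenLabsRealtimeOptions {
  connectionMode?: 'direct' | 'proxy'
  debug?: boolean
}
interface RealtimeSessionConfig {
  instructions?: string
}

// connect() should forward both objects instead of discarding them, so
// createElevenLabsConnection receives the caller's configuration intact.
function buildConnectionArgs(
  token: string,
  config: RealtimeSessionConfig,
  options: ElevenLabsRealtimeOptions,
) {
  return { token, config, options }
}
```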
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/typescript/ai-elevenlabs/src/realtime/adapter.ts`:
- Around line 35-47: The connect implementation of elevenlabsRealtime is
dropping the caller-provided ElevenLabsRealtimeOptions and
RealtimeSessionConfig; rename the unused parameter _config to config in the
connect signature and thread both options (from elevenlabsRealtime) and config
(RealtimeSessionConfig) into createElevenLabsConnection so session
initialization receives connectionMode/debug and session config; update the call
sites inside elevenlabsRealtime.connect to pass (token, config, options,
clientToolDefs) or the equivalent parameter order expected by
createElevenLabsConnection and adjust createElevenLabsConnection invocation
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e9ecd1b6-aed5-4bb5-81bc-c287adfd2e67
📒 Files selected for processing (5)
- packages/typescript/ai-client/src/realtime-client.ts
- packages/typescript/ai-client/src/realtime-types.ts
- packages/typescript/ai-elevenlabs/src/realtime/adapter.ts
- packages/typescript/ai-gemini/src/realtime/adapter.ts
- packages/typescript/ai-openai/src/realtime/adapter.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/typescript/ai-client/src/realtime-client.ts
- packages/typescript/ai-client/src/realtime-types.ts
- packages/typescript/ai-openai/src/realtime/adapter.ts

🎯 Changes
✅ Checklist
`pnpm run test:pr`.

🚀 Release Impact
Summary by CodeRabbit